remarkably predictable shapes or distributions (the most common being the
Weibull distribution, covered in Chapter 24). Because of this, these disciplines
often use a parametric form of survival regression, which assumes that you
can represent the survival curves by algebraic formulas. Unfortunately for
biostatisticians, biological data tends to produce nonparametric survival curves
whose distributions can’t be represented by these parametric distributions.
As described earlier, nonparametric survival analyses using life tables,
Kaplan-Meier plots, and log-rank tests are limited. But as biostatisticians, we
could not rely on parametric distributions in our models either; we wanted a
hybrid, semi-parametric kind of survival regression. We wanted one that was
partly nonparametric, meaning it didn’t assume any mathematical formula for the
shape of the overall survival curve, and partly parametric, meaning we could
use some parameter (or predicted survival distribution shape) to guide our
formulas the way other industries used the Weibull distribution. In 1972, a
statistician named David Cox developed a workable method for doing this. The
procedure is now called Cox proportional hazards regression, which we call PH
regression for the rest of this chapter for brevity. In the following sections,
we outline the steps of performing a PH regression.
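To make the hybrid idea concrete, here is a minimal Python sketch. It assumes the usual proportional hazards relationship, in which a subject's survival curve equals the baseline curve raised to the power exp(beta * x). The data, the coefficient value, and the function names are all hypothetical, and the Kaplan-Meier estimate is used only as a stand-in for the baseline curve; real software estimates the coefficient from your data and derives the baseline from the fitted model.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event occurred at that time, 0 if censored
    """
    curve = []
    surv = 1.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        died = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        surv *= 1.0 - died / at_risk
        curve.append((t, surv))
    return curve

def bend_curve(baseline, beta, x):
    """Apply S(t | x) = S0(t) ** exp(beta * x) to every point of the baseline."""
    hazard_ratio = math.exp(beta * x)
    return [(t, s ** hazard_ratio) for t, s in baseline]

# Hypothetical follow-up data (not from the book).
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

# Nonparametric part: no formula is assumed for the shape of the curve.
baseline = kaplan_meier(times, events)

# Parametric part: one assumed coefficient (hazard ratio of 2 per unit of x).
beta = math.log(2.0)
exposed = bend_curve(baseline, beta, x=1)

for (t, s0), (_, s1) in zip(baseline, exposed):
    print(f"t={t:>2}  baseline={s0:.3f}  x=1: {s1:.3f}")
```

The baseline curve keeps whatever irregular shape the data give it, while a single number, beta, controls how far a predictor pushes the curve up or down.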
Since 1972, many issues have been identified with using survival regression for
biological data, especially with respect to whether it is appropriate for the
type of data. One way to check this is to run a logistic regression model (see
Chapter 18) with the same predictors and outcome as your survival regression
model, but without the time variable, and see whether the interpretation changes.
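The following Python sketch illustrates the spirit of that check under strong simplifying assumptions. With a single binary predictor, the odds ratio from a 2-by-2 table equals the exponentiated coefficient of a logistic regression, so it is computed directly here instead of fitting a model. The time-aware comparison uses a crude event-rate ratio as a stand-in for a fitted survival regression, which is an assumption made for brevity. The records and function names are hypothetical.

```python
import math

# Hypothetical records: (exposed, had_event, follow_up_years)
records = [
    (1, 1, 1.0), (1, 1, 2.0), (1, 1, 3.0), (1, 0, 4.0),
    (0, 1, 2.0), (0, 1, 6.0), (0, 0, 8.0), (0, 0, 8.0),
]

def odds_ratio(rows):
    """Odds ratio of the event, exposed vs. unexposed, ignoring time.

    With one binary predictor, this equals exp(coefficient) from a
    logistic regression fit by maximum likelihood.
    """
    a = sum(1 for x, e, _ in rows if x == 1 and e == 1)
    b = sum(1 for x, e, _ in rows if x == 1 and e == 0)
    c = sum(1 for x, e, _ in rows if x == 0 and e == 1)
    d = sum(1 for x, e, _ in rows if x == 0 and e == 0)
    return (a * d) / (b * c)

def rate_ratio(rows):
    """Events per unit of follow-up time, exposed vs. unexposed.

    A crude time-aware measure used here in place of a fitted
    survival regression (an assumption for illustration only).
    """
    def rate(group):
        n_events = sum(e for x, e, _ in rows if x == group)
        person_time = sum(t for x, _, t in rows if x == group)
        return n_events / person_time
    return rate(1) / rate(0)

time_free = odds_ratio(records)    # ignores follow-up time
time_aware = rate_ratio(records)   # uses follow-up time

# Do both analyses point in the same direction (both above or both below 1)?
same_direction = (math.log(time_free) > 0) == (math.log(time_aware) > 0)
print(f"odds ratio={time_free:.2f}  rate ratio={time_aware:.2f}  "
      f"same direction: {same_direction}")
```

If the two measures disagree in direction, or differ greatly in size, that suggests the time variable is changing the interpretation and the model deserves a closer look.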
The steps to perform a PH regression
You can understand PH regression in terms of several conceptual steps, although
when you use statistical software like that described in Chapter 4, it may
appear that these steps take place simultaneously. That is because the output is
designed for you, the biostatistician, to walk through the following steps in
your mind and make decisions. You must use the output to:
1. Determine the shape of the overall survival curve produced from the
   Kaplan-Meier method.
2. Estimate how your hypothesized predictor variables may impact the
   bends in this curve (in other words, in what ways your predictors may
   affect survival).